A statistical approach to crosslingual natural language tasks

Authors

David Pinto

Jorge Civera

Alfons Juan-Císcar

Paolo Rosso

Alberto Barrón-Cedeño

Abstract

The huge volume of documents written in multiple languages on the Internet has led to the investigation of novel approaches for dealing with information of this kind. We propose a statistical approach to crosslingual natural language tasks. In particular, we apply IBM alignment model 1 to obtain a statistical bilingual dictionary, which can then be used to approximate the relatedness probability of two given documents written in different languages. The experimental results obtained in three different tasks (text classification, information retrieval and plagiarism analysis) highlight the benefit of the presented statistical approach.
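To make the idea in the abstract concrete, here is a minimal sketch of IBM Model 1 trained with EM on a toy parallel corpus, followed by one possible way to turn the resulting dictionary into a document relatedness score. The abstract does not give the authors' exact scoring formula, so `relatedness`, its length normalisation, the `floor` smoothing value, and the toy sentence pairs are all assumptions made for illustration; the NULL source word of the full model is also omitted for brevity.

```python
import math
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate translation probabilities t(f | e) with EM (IBM Model 1).

    `pairs` is a list of (source_tokens, target_tokens) sentence pairs.
    Simplification: no NULL source word is used.
    """
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Uniform initialisation over the target vocabulary.
    t = {(f, e): 1.0 / len(tgt_vocab) for f in tgt_vocab for e in src_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities.
        for (f, e) in t:
            t[(f, e)] = count[(f, e)] / total[e] if total[e] > 0 else 0.0
    return t


def relatedness(doc_f, doc_e, t, floor=1e-9):
    """Length-normalised log p(doc_f | doc_e) under Model 1.

    This scoring rule is an illustrative assumption, not the formula
    from the paper. `floor` avoids log(0) for unseen word pairs.
    """
    log_p = 0.0
    for f in doc_f:
        p = sum(t.get((f, e), 0.0) for e in doc_e) / len(doc_e)
        log_p += math.log(max(p, floor))
    return log_p / len(doc_f)


# Toy English -> German corpus (hypothetical data).
pairs = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = train_ibm1(pairs)

related = relatedness(["das", "Buch"], ["the", "book"], t)
unrelated = relatedness(["das", "Buch"], ["a", "house"], t)
print(related > unrelated)
```

The table `t` plays the role of the statistical bilingual dictionary: for each source word it holds a probability distribution over target words. A score of this kind could then rank candidate documents in retrieval, pick the most likely class in classification, or flag suspiciously related passages in plagiarism analysis.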


Similar resources

ITRI-03-13 CROCODIAL: Crosslingual Computer-mediated Dialogue

We describe a novel approach to crosslingual dialogue which allows for highly accurate communication of semantically complex content. The approach is introduced through an application in a B2B scenario. We are currently building a browser-based prototype for this scenario. The core technology underlying the approach is natural language generation. We also discuss how the proposed approach can c...


CROCODIAL: Crosslingual Computer-mediated Dialogue


Inducing Crosslingual Distributed Representations of Words

Distributed representations of words have proven extremely useful in numerous natural language processing tasks. Their appeal is that they can help alleviate data sparsity problems common to supervised learning. Methods for inducing these representations require only unlabeled language data, which are plentiful for many natural languages. In this work, we induce distributed representations for ...


Crosslingual Distributed Representations of Words


Borrowing Language Resources for Development of Automatic Speech Recognition for Low- and Middle-Density Languages

In this paper we describe an approach that both creates crosslingual acoustic monophone model sets for speech recognition tasks and objectively predicts their performance without target-language speech data or acoustic measurement techniques. This strategy is based on a series of linguistic metrics characterizing the articulatory phonetic and phonological distances of target-language phonemes f...


Journal:

J. Algorithms

Volume 64

Published 2008
